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Reconstructing the sky location of gravitational-wave detected compact 
binary systems: Methodology for testing and comparison 

T. Sidery,*'* B. Aylott/ N. Christensen,^ B. Farr,^’^ W. Farr,^’* F. Feroz,"^ J. Gair,^ K. Grover,* P. Graff,^ 

C. Hanna,^ V. Kalogera,^ I. Mandel,* R. O’Shaughnessy,* M. Pitkin,^ L. Price,*** V. Raymond,*** C. Rover,**’*^ 
L. Singer,*** M. van der Sluys,*^ R. J. E. Smith,* A. Vecchio,* J. Veitch,*"* and S. Vitale*^ 

^School of Physics and Astronomy, University of Birmingham, Birmingham B15 2TT, United Kingdom 
^Physics and Astronomy, Carleton College, Northfield, Minnesota 55057, USA 
^Department of Physics and Astronomy, Center for Interdisciplinary Exploration and Research in 
Astrophysics (CIERA), Northwestern University, Evanston, Illinois 60208, USA 
^Cavendish Laboratory, University of Cambridge, Cambridge CBS ONE, United Kingdom 
^Institute of Astronomy, University of Cambridge, Cambridge CBS OHA, United Kingdom 
^NASA Goddard Space Flight Center, Greenbelt, Maryland 20771, USA 
^ Perimeter Institute for Theoretical Physics, Waterloo, Ontario N2L 2Y5, Canada 
^Center for Gravitation and Cosmology, University of Wisconsin-Milwaukee, 

Milwaukee, Wisconsin 5S2II, USA 
‘^SUPA, School of Physics and Astronomy, University of Glasgow, 

University Avenue, Glasgow GI2 8QQ, United Kingdom 
^^LIGO, California Institute of Technology, Pasadena, California 91 125, USA 
* ^ Max-Planck-Institut fiir Gravitationsphysik (Albert-Einstein-Institut), 

Callinsrafia S8, S0I67 Hannover, Germany 

^^Department of Medical Statistics, University Medical Center, Gottingen, S707S Gottingen, Germany 
^^Radboud University Nijmegen, P.O. Box 9010, NL-6500 GL Nijmegen, The Netherlands 
^^Nikhef Science Park 105, Amsterdam 1098 XG, The Netherlands 
'^^Massachusetts Institute of Technology, 185 Albany Street, Cambridge, Massachusetts 02139, USA 
(Received 19 December 2013; published 18 April 2014) 

The problem of reconstmcting the sky position of compact binary coalescences detected via 
gravitational waves is a central one for future observations with the ground-based network of 
gravitational-wave laser interferometers, such as Advanced LIGO and Advanced Virgo. Different 
techniques for sky localization have been independently developed. They can be divided in two broad 
categories: fully coherent Bayesian techniques, which are high latency and aimed at in-depth studies of all 
the parameters of a source, including sky position, and “triangulation-based” techniques, which exploit the 
data products from the search stage of the analysis to provide an almost real-time approximation of the 
posterior probability density function of the sky location of a detection candidate. These techniques have 
previously been applied to data collected during the last science mns of gravitational-wave detectors 
operating in the so-called initial configuration. Here, we develop and analyze methods for assessing the self 
consistency of parameter estimation methods and carrying out fair comparisons between different 
algorithms, addressing issues of efficiency and optimality. These methods are general, and can be applied 
to parameter estimation problems other than sky localization. We apply these methods to two existing sky 
localization techniques representing the two above-mentioned categories, using a set of simulated inspiral- 
only signals from compact binary systems with a total mass of < 20Mq and nonspinning components. We 
compare the relative advantages and costs of the two techniques and show that sky location uncertainties 
are on average a factor «20 smaller for fully coherent techniques than for the specific variant of the 
triangulation-based technique used during the last science runs, at the expense of a factor «1000 longer 
processing time. 
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I. INTRODUCTION 

Ground-based gravitational-wave (GW) laser 
interferometers — ^Laser Interferometer Gravitational Wave 
Observatory (LIGO) [1], Virgo [2] and GEO-600 [3] — ^have 
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completed science observations in 2010 (S6WSR2-3) [4] in 
the so-called initial configuration, and are currently being 
upgraded with the plan to start running again from 2015 at a 
significantly improved sensitivity [5,6]. No detection was 
achieved during this initial period of observations; how- 
ever, the expectations are that by the time the instruments 
reach design “advanced” sensitivity they shall routinely 
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detect gravitational-wave signals. One of the most prom- 
ising candidate sources for detection are coalescing binary 
systems of compact objects containing neutron stars and 
black holes [7]. 

One of the key pieces of information to extract is the 
source location in the sky. Once a detection candidate is 
identified by search pipelines, the location parameters that 
describe the source are reconstructed using a number of 
techniques, both high and low latency [8,9]. In contrast to 
traditional telescopes, gravitational-wave instruments are 
all-sky monitors and the source location in the sky is 
reconstructed a posteriori. Information about the source 
geometry is primarily encoded in the relative time of arrival 
of GW radiation at the different detector sites, together with 
the relative amplitude and phase of the GWs as seen in 
different detectors. Constraining the source location on the 
sky will be an important element of the analysis because it 
allows for follow-ups of the relevant portion of the sky with 
electromagnetic instruments, possibly over a wide spectral 
range, and could offer information about the environment 
of a GW-detected binary [10-12]. The electromagnetic 
signatures associated with the merger of the compact objects 
are expected to be transient, so the time scale over which 
the sky location information becomes available from the 
gravitational-wave ground-based network is also important. 

For this reason the problem of reconstructing the sky 
position of GW sources with the network of ground-based 
laser interferometers is an area of active work in preparation 
for advanced instruments [ 1 3-1 8] . By the end of observations 
with instruments in initial configuration, two main imple- 
mentations had been used to determine the sky localization 
uncertainty region of a coalescing binary candidate [8,9]: 

(i) LALInference [19], a hbrary of fully coherent 
Bayesian analysis algorithms, computes the posterior 
probability density function (PDF) on the sky location 
and other parameters on the time scale of hours to 
several weeks, depending on the specific signal. Using 
two classes of stochastic sampling techniques, 
Markov-Chain Monte Carlo [20-22] and nested sam- 
pling [23-25], LALInference coherently analyzes the 
data from all the interferometers in the network and 
generates the multidimensional PDF on the full set of 
parameters needed to describe a binary system before 
marginalizing over all parameters other than the sky 
location (a binary in circular orbit is described by 9 
to 15 parameters, depending on whether spins of the 
binary components are included in the model). 

(ii) A much faster low-latency technique, that we will 
call Timing-i-i- [8], uses data products from the 
search stage of the analysis, and can construct sky 
maps on (sub)minute time scales by using primarily 
time-delay information between different detector 
sites. In particular, the masses, time and phase of 
arrival, and the amplitude of the signal are searched 
for in each detector separately and the masses and 


time of arrival are checked for consistency [26]. The 
time of arrival and amplitude of the signal in each 
detector are the intermediate data products used by 
Timing-i-i- to construct the PDF of the sky location. 
These two approaches were initially designed to serve 
different purposes: a thorough parameter reconstruction 
and a low-latency sky locahzation technique, trading off 
accuracy for computational speed. 

The goal of this paper is twofold. Several parameter 
estimation approaches have been, and continue to be, 
developed in preparation of the advanced instruments 
coming online in 2015. Algorithms may be tuned in specific 
ways to serve different purposes. The first goal of this paper 
is to provide fair and rigorous methods to compare different 
approaches in order to inform future developments. One of 
the most actively investigated parameter estimation aspects 
is sky localization reconstruction. It is therefore natural to 
apply these comparison methods to the algorithms used up to 
now to check the consistency of the results, quantify relative 
benefits and identify the areas that need the most attention in 
the future. The second goal of this paper is to provide the first 
rigorous comparison of the two sky localization techniques 
described above. We examine the sky location PDFs for a 
large number of simulated signals from coalescing compact 
binaries with total masses of up to 20 Mq in simulated 
stationary, Gaussian noise. Although our signal distribution 
is not astrophysically motivated, it allows us to statistically 
examine the self consistency of both techniques by testing 
whether the claimed uncertainty regions match the actual 
probability that the source is found at those sky locations. 
Furthermore, by comparing the uncertainties in sky location 
across the code outputs we gain an understanding of the 
systematic behavior of each technique. Many of these 
comparison methods have now become the routine test bed 
in the development effort for gravitational-wave data analysis 
and may have applicability in other areas of astronomy. 

The paper is organized as follows. In Sec. II we describe 
two techniques used to determine the sky location of a 
candidate coalescing compact binary. In Sec. Ill, we 
evaluate the correctness of the two techniques using a 
simulated population of binaries over a wide range of 
the parameter space, comparing their sky localization 
capabilities and latency time scales. Sec. IV contains our 
conclusions and pointers to future work. 

II. LOCATION RECONSTRUCTION METHODS 

Gravitational-wave interferometers are, by design, 
sensitive to sources across much of the sky. Because of 
this, position reconstruction estimates rely largely on time 
delays between sites in a multiple detector network, i.e., 
triangulation. Using only time-delay information, there is 
generally a degeneracy in the position reconstructed. For a 
two-detector network, this degeneracy is a conical surface 
of constant time delay around the line connecting the two 
detectors, whose projection onto the sky plane yields a ring. 
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For a three-detector network, this degeneracy is broken into 
two regions symmetric about the plane defined by the 
detectors: the intersections of two rings on the sky. A four- 
(or more) detector network will generally identify a single 
region in the sky. However, time delays are not the only 
source of sky location information. Though the observed 
amplitude of gravitational waves depends only weakly on 
the source location, it typically helps to break these 
degeneracies in two and three detector networks; further 
information is contained in the relative phasing between 
detectors [18]. In this section we outline the two methods 
considered so far for position reconstruction. 

We can formalize the problem we want to address as 
follows. The data, 

dj{t) = tij{t) + hj{f,0), (1) 

from each gravitational- wave interferometer in the network 
j = 1 , where N is the number of instruments^ is a 

sum of the noise rij{t) and any signal hj{t\ 6), where 0 is a 
vector that describes the set of unknown parameters that 
characterize the GW source. For this study we consider 
coalescing binaries of compact objects with approximately 
circular orbits and negligible spins; 0 is a nine-dimensional 
parameter vector: two mass parameters (the two component 
masses 2 , or an alternative combination of these, e.g., 
the symmetric mass ratio r] = -f and the 

chirp mass M. = mj)), the distance to the 

source D, the source location in the sky (described by 
two angles that identify the unit vector Q, e.g., right 
ascension a and declination S), the orientation of the binary 
(polarization i/r and inclination of the orbital plane ;) and the 
reference phase tpQ and time tg. To simplify notation, we 
define 


parameters 0. The evidence p(d) is used to normalize 
the integral of the posterior over the entire parameter space 
to unity. 

A. LALInference 

The evaluation of p(0\d) is notoriously difficult in 
high-dimensional problems with complex likelihood func- 
tions, as is the case for coalescing compact binaries in a 
network of laser interferometers. We have developed a 
set of sampling algorithms within the LSC Algorithms 
Library (LAL) [27], collected under LALInference [19], 
specifically for the analysis of gravitational-wave data, and 
for what is relevant here, coalescing binary signal models. 
The library contains two main stochastic parameter- 
space exploration techniques: Markov-Chain Monte Carlo 
(lalinference_mcmc [22]), and nested sampling 
(lalinference_nest [24] and lalinference_bambi 
[28]). Different algorithms are included to validate results 
during the development stage and to explore a range of 
schemes to optimize the run time. These techniques have 
been used to analyze a set of hardware and software 
injections as well as detection candidates during the last 
LIGOWirgo science runs [9]; a technical description of the 
algorithms will be reported elsewhere [19]. 

The output of a LALIneerence run is a list of 
“samples,” values of 0 drawn from the joint posterior 
probability density function. The density of samples in a 
region of parameter space is proportional to the value of the 
PDF. For the specific sky localization problem we are 
considering here, the marginalized posterior probability 
density function on the sky location is simply 

p{ii\d) = J p{Q,mdA ( 4 ) 


0={^,h^ (2) 

where p is the parameter vector that does not contain the 
sky location parameters, right ascension and declination. 
Regardless of the specific technique that one decides to 
adopt, the goal is to evaluate p{Q.\d), the marginalized joint 
posterior density function of the sky location parameters 
given the observations. 

A straightforward application of Bayes’ theorem allows 
us to calculate the posterior probability density for a model 
with parameters 0 given the data, d, using 


p{0\d) 


P{d\0)p{0) 

P(d) 


(3) 


The prior probability density, p{0), encapsulates all our 
a priori information about the expected distribution of 
sources in distance, masses or other parameters in the 
model. The likelihood p{d\0) is the probability of gen- 
erating the data set d given an assumed signal with 


where p{Q.,P\d) = p{0\d) is derived using Eq. (3). If we 
could extract an infinite number of samples then we would 
be able to map out the PDF perfectly; however, these are 
computationally intensive algorithms, see Sec. HID for 
more details, and we typically have ~1000 independent 
samples. The finite number of samples can introduce both 
stochastic and systematic bias, and so we have imple- 
mented a two-step kD-tree binning process to estimate the 
PDF that removes the systematic issues [29]. 

The fully coherent Bayesian analysis takes into account 
the search stage of the analysis only to set the prior range for 
the arrival time of a gravitational wave around the observed 
detection candidate. However, the matched-filtering stage of 
a search already offers processed information that can be 
used to generate approximate posterior density functions 
p{Q\d). This is the approach taken in Timing-i-i-. 

B. T1MING++ 

T 1 MING- 1 -+ [8] takes the parameters of the waveform that 
best fit the data in each detector, as found by the initial 
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search [26], and assumes that the posterior of interest is 
only a function of the arrival times in each detector, and 
the amplitude of the signal in each detector, . That is, we 
write 


(5) 

where Q is the location on the sky. We further assume that 
the information in the arrival times and amplitudes can each 
he replaced hy a single quantity so that 


consistency between detectors is used. The starting point is 
the fact that 


Pi a 


D 


(0 ’ 
eff 


(9) 


where Dgff is an effective distance, defined hy 


£>eff — ^ 


1 + cos'll 


2 ,\ 2 


Fzcos^i 


- 1/2 


(10) 


a/(A?rss.sc(^), AA^ 

— y(A?rss,sc? AAj-ss), (6) 

where /(Ahss AA^sJ is an empirically derived distrihu- 
tion function and A^ss.sc ^nd AA^-^s are described in the 
following. For a source at position Q, the arrival time at 
detector i allows us to predict the arrival time at any other 
fiducial point, which, for the sake of simplicity, we choose 
to be the geocenter. In the absence of noise, the predicted 
geocentric arrival times, computed separately from each 
detector’s measured arrival time, should coincide. The 
summed squared differences of the predicted arrival times 
at the geocenter between detector pairs give us a measure of 
how far we expect to be from the true location: 


and X = V^) are the antenna beam pattern 

functions; see Eqs. B9 and BIO of Ref. [31]. While the 
matched filter detection pipeline produces an estimate of 
Dgff separately in each detector, it is not invertible to obtain 
any of the quantities in Eq. (10) directly. With that in mind, 
we define 


A2=. 


1 


Fl(a,y/ = 0) + Fl{a,y/ = 0) 


(11) 


and use 


AA. 




l>] 


D . 


(02 A{i)2 _ A{j)2\2 

eff -^eff _ ^ ^ ) 


( 12 ) 


Ahss = - 42o(^)) - (4if - 4eo(^)))4 (7) 

y <>i 

where 4eo(i2) is the difference between the arrival time of 
a signal from Q at detector (i) and at the geocenter, and 
is the time the signal crosses a reference frequency in 
the band of detector i. This vanishes in the idealized case of 
no noise for the true location. By appropriately choosing 
the reference frequency we minimize the correlation 
between the determined mass and phase in the waveform 
and the recovered time of arrival [30]. This is important 
since the parameters of the waveform are determined 
separately in each detector. Moreover, we expect that these 
errors in timing will scale inversely with the signal-to-noise 
ratio (SNR) of the system in the high-SNR regime: 

10 

Atrss Atj-ss sc ’ (^) 

P 

where p = \/'f2,ip] is the combined SNR, p, is the SNR 
measured in detector i, and the factor of 10 is chosen as a 
fiducial SNR. We use the SNR-corrected Ahss.sc place of 
Atj-ss to remove this dependence on SNR. 

Incorporating the amplitude of the signal is more com- 
plicated. The SNR is a function not only of sky location but 
also of luminosity distance, inchnation and polarization of 
the signal. Because this method is designed for low-latency 
sky localization, a somewhat ad hoc measure of amplitude 


as a measure of the consistency of the calculated and 
observed difference in response functions between each 
detector pair. In contrast to Eq. (7), this quantity is typically 
not zero in the absence of noise as A^ = D^f^/D only when 
inclination and polarization are both 0. However, the use of 
amplitude reconstruction in this manner has been deter- 
mined empirically to improve position reconstruction 
estimates. In contrast to there is no adjustment 

for SNR in AA^jg. Grover et al. [18] showed that phase 
consistency between detectors can provide additional 
information on sky location and significantly reduce sky 
localization uncertainty; however, phase consistency was 
not included in Timing-i-i-. 

Putting together our previous assumptions, 

p{a\d) K p(G|f(9,A(')) 

a p{n)f{At 

rss,sc7 AAfss) 

«p(Q)A(Ah,,,,,)/^(AA,,,), (13) 

where p{Q.) is the prior on the sky location and in the third 
line we have assumed that /(Afj-ss AA^-^s) can be written 
as the product of two other empirical distributions, 
/,( Aljss sc) and f^ ( AA^ss) ■ In this work we assume isotropic 
priors on the sky location. In the low-latency search for 
compact binaries and associated electromagnetic counter- 
parts for which Timing-i-i- was designed, a restrictive prior 
that limited consideration to only areas of the sky 
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containing galaxies was imposed, as described in [8]. In 
practice, fd^t^ss.sc) ^nd measured before- 
hand using simulations, where and are 

computed from the recovered arrival times and effective 
distances, respectively, and the true (known) sky location, 
This amounts to evaluating g;, (AA^js) according 
to Eq. (7) [Eq. (12)] at Q,j.ue using the time of arrival 
(effective distance) from the matched filter pipeline. 
A kernel density estimator is then used to estimate the 
distribution of these quantities. When a candidate is found, 
Atrss sc ^nd AAj-ss are computed across a fixed grid on the 
sky, and the likelihood is taken from the previously 
simulated distributions and the result is normalized, leading 
to an inherently fast method. 

III. TESTING 

The goal of this study is to compare the relative 
performances in terms of sky localization of Timing-i-i- 
and EAEInference and in doing so to develop a set of 
criteria and general tools that can be applied to many 
parameter estimation problems in which different tech- 
niques are considered. The tests should ensure that each 
algorithm separately is self consistent, and then provide fair 
methods of making comparisons. 

Eor the specific problem considered in this paper, 
Timing-i-i- and EAEInference both evaluate the posterior 
probability density function p(Q|ri); see Eqs. (4) and (13). 
Eor a given model assumption and data realization, there 
is an exact PDE of which the algorithms produce an 
approximation. There are many effects that can distort 
the recovered PDE from the true one. They can be grouped 
in two broad categories. 

Irrespective of the algorithm that is used, the assump- 
tions on the elements that enter the PDE calculation may 
differ from the actual problem, and therefore produce a bias 
in the results. Eor the problem at hand, they can be 
summarized as follows: (i) the model waveform family 
does not describe the actual signal contained in the data; 
(ii) the noise model is incorrect; and (iii) the choice of 
priors does not match the actual ones that describe the 
problem, and, in the specific case considered here, the 
priors from which the source parameters have been drawn. 
Each of these enter the calculation of the PDE; see Eq. (3). 
In the test described here, the signal model (the waveform 
family) is exactly known, and the same waveform family is 
used for the signal generation and the likelihood calcu- 
lation. The statistical properties of the noise — Gaussian and 
stationary drawn from a known distribution — are also 
known. It is, however, important to emphasize that in 
the case of EALIneerence the noise power spectral density 
(PSD) is estimated from the data surrounding the signal, 
and as a consequence it does not exactly describe the 
distribution from which the noise is drawn. Eor the 
Timing-i-i- analysis, on the other hand, the noise PSD is taken 
to be exactly the one used to generate the noise realizations. 
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A different set of effects that can affect the recovered 
PDE are more fundamentally intrinsic to the algorithms: 
(i) the assumptions that go into the likelihood calculation 
are not perfect, (ii) there are algorithmic issues that produce 
errors, and (iii) PDEs cannot be reconstructed perfectly 
from a finite number of samples (postprocessing). 
The likelihood calculation makes assumptions about the 
form of the noise and so is linked to the previously 
mentioned noise issue. Eor Timing-i-i-, the likelihood is 
calculated using a mix of approximations and simulated 
runs. This is a point of possible bias entering the results of 
the Timing-i-i- runs; measuring its extent is part of our 
investigation. 

As well as the obvious statement that the algorithm must 
be working correctly, it was found with EAEIneerence 
that the way that the results are processed to create a 
continuous PDE from discrete samples from the posterior 
can also introduce noticeable distortions. This is linked to 
the finite sampling issues mentioned previously and fixed 
with two-stage kD-trees [29]. 

While in theory the sources of bias due to the test itself 
are straightforward to control, any erroneous results may 
either be due to code issues or a failure to properly treat the 
setup issues, both of which may give very similar dis- 
tortions in the final PDE. This leads to a cycle of code 
checking and test setup checking while codes are being 
developed. This is particularly true of the EALIneerence- 
type algorithms that, with the correct setup, should pre- 
cisely recover the PDE, creating a stringent checking 
mechanism for the codes’ self consistency. 

A. Test population 

To set up a rigorous comparison test bed we have 
considered 360 mock inspirahing compact binary signals 
from a population of binary sources and “injected” the 
waveforms into Gaussian and stationary noise representing 
observations with the three-detector network consisting of 
the two EIGO detectors at Hanford, Washington and 
Eivingston, Eouisiana and the Virgo detector near Pisa, 
Italy. The power spectrum of the noise was chosen to mimic 
the EIGO sensitivity achieved during the last science run 
[4] and was the same for ah the instruments of the network, 
including Virgo. A subset of this population has been 
recently used for other parameter estimation studies; 
see Refs. [18,32]. The noise data were generated with 
the infrastructure used for the NINJA-2 project [33]. The 
low-frequency cutoff was set to 40 Hz. 

The source distribution was chosen to test these two 
sky localization approaches over a large range of signal-to- 
noise ratios and physical parameters that describe stellar 
mass binary systems, rather than being astrophysically 
motivated. The mass distribution was uniform in compo- 
nent masses with IMq < 2 < 15Mq and a cutoff on the 

total mass + m 2 < 20Mq. The sky position and ori- 
entation of the systems with respect to the interferometers 
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RA (rad) 

EIG. I (color online). An output PDE of the sky position from 
the two codes. The contour lines label the 50% and 90% credible 
regions for Timing++ while the light and dark shaded regions 
show the 50% and 90% credible regions, respectively, for 
LALInference. The star indicates the source location. 

were distributed uniformly. For distances between 10 and 
40 Mpc tbe logarithm of the distance was uniformly 
distributed in order to give a broad range of network 
SNRs above the detection threshold. 

The waveforms used to generate, and then analyze, the 
signal are restricted post-Newtonian approximations of the 
inspiral phase, with spins of the binary components set to 
zero. The time-domain TaylorTS and TaylorT4 approxim- 
ants of the LAL [27] at second post-Newtonian order in 
phase, in which the differential equations that describe the 
evolution of a characteristic orbital velocity and phase of 
the system are Taylor expanded in terms of the character- 
istic velocity of the two inspiralling objects [34], were used 
for Timing-i-i- and LALInference, respectively. The pre- 
cise forms of the two families of waveforms have phase 
differences from post^'^ -Newtonian order and above, which 
has no effect for the purpose of these comparisons; the 
crucial factor for these tests was that each code used 
the same waveform family for injection and subsequent 
recovery of the signal. It was necessary to use different 
waveforms in each code due to compatibility issues of the 
implementations . 

The synthetic data containing GW signals added to noise 
were processed using the standard matched-filter search 
pipeline ihope [26] used in the LIGOWirgo analyses in this 
parameter range; see, e.g.. Ref. [35] and references therein. 
LALInference was run on all the 360 injections, with a flat 
prior on the time of arrival over a range of ± 1 00 milliseconds 
around the time of the injection. Timing-h- uses an addi- 
tional criterion that the SNR must be greater than 5.5 in each 
of three detectors; 243 candidates passed this cut. Figure 1 
gives an example output PDF from one of the runs. For the 
self-consistency tests described in Sec. Ill B we used all the 
results available for each algorithm. For the comparisons 


between the codes in Sec. Ill C, we only used those data sets 
for which results from both methods are available. 

B. Self-consistency checks 

We describe the PDF via credible levels (CL): the 
integrated probability, in our case p(Q.\d), over a given 
region of the parameter space. In particular we consider the 
smallest region, or minimum credible region (CR^iin), for a 
given CL; in our case, this corresponds to the smallest 
region in the sky that contains the given probability that the 
source is in that location. More formally, for a given CL, 
any credible region (CR) must satisfy 

CL= / p{Q\d)dQ. (14) 

JCR 

We can then find the smallest region such that this still 
holds, which we call CR^jn. By considering the full range 
of probabilities we can map out the PDF with a set of 
contours that bound each CRn,j„. 

While the analysis of a single GW signal will not tell us 
very much about the correctness of the analysis, consid- 
ering how CL and CR,jiin are related over a large number of 
GW signals gives us statistical information: Does a given 
credible level really correspond to the probability of finding 
the source in that location? For each run and a given CL 
we can check if the injection’s parameter coordinates fall 
within the associated CR^ji^; if there are no sources of bias 
in the analysis, this should happen with probability CL in 
order for the credible regions to be meaningful. We can plot 
a cumulative figure, over all injected signals and the full 
range of CLs, of the proportion of injections found within a 
given CL’s CR^^jn. We expect this to be diagonal, up to 
statistical fluctuations arising from a finite number of 
injections. Deviations from the diagonal indicate that the 
parameter estimation algorithm does not correctly evaluate 
the PDF, or other sources of bias are present, e.g., the priors 
used in the analysis do not match the distribution of the 
injected source population. 

The results of this test from all the signals detected out of 
the 360 injections in each of LALIneerence and 
Timing-h- is shown in Fig. 2. The error bars are calculated 
from the expected variance in the number of injections that 
fall within a given CR. For a CL of p, and n runs, the variance 
on the number of sources found within CR„,in is np (I — p)ii 
the fraction of injections that fall within a given CRmin is 
really described by the binomial distribution, as expected. 
The error bars on the fraction of injections found within a 
given CRmin are given by the standard deviation normalized 
by the number of runs, p{\ — p)/n. 

We can see here that LALIneerence produces results 
that indeed follow the expected relation; we can therefore 
conclude that the algorithm is self consistent. During the 
LALIneerence development, parallel to this investigation, 
this test was used as one of the primary tools to check the 
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FIG. 2 (color online). For each CL we plot the number of 
injections that fall within the associated minimum credible region 
CRmin for ^11 Iho signals analyzed with LALInference, bottom 
(red) curve, and Timing++, top (green) curve. The error bars 
correspond to the binomial error; see text for more details. A self- 
consistent algorithm gives results that lie along the diagonal line 
of this plot. Results that fall above the expected line, as is the case 
for T1MING++, highlight an algorithm that is overcautious in its 
estimation of CR^j^. 

algorithm. As well as checking sky location, this test was 
done in each of the model parameters separately, though 
rather than using the minimal CL it is easier and sufficient 
to use a connected credible region whose lower hound is 
the lowest value of the parameter being investigated. 

On the other hand, the results obtained with T1MING++ 
show a significant deviation from the expected behavior: 
the calculated CRs for T1MING++ do not represent the 
“true” CL. As the results are above the expected behavior, 
the sky regions are too large. This shows that T1MING++ is 
not “self-consistent.” This is not necessarily unexpected 
because Timing-i-i- is purposefully an approximation in 
favor of speed; it is useful to note that Timing-i-i- is over 
conservative. 

From these results it also follows that we need to be 
cautious when designing comparisons between Timing-i-i- 
and LALInference applied to the same GW signal. We 
consider these comparisons in the next section. 

C. Comparisons 

We can now turn to comparisons between Timing-i-i- and 
LALIneerence, and we consider two different figures of 
merit for this. 

For a self-consistent code, the CR^jn of a chosen CL is a 
natural metric of the ability of the algorithm to localize the 
source. This is equivalent to stating the expected smallest 
region of the sky that needs to be scanned by a follow-up 


observation to have a given probability that the actual 
source location is covered. Here, we will consider the 50% 
minimum credible region, and therefore set CL = 0.5. 
While this is natural for the fully coherent Bayesian codes, 
the same is not true of Timing-i-i-. We saw in the previous 
section that Timing-i-i- is not self consistent: it does not 
provide the correct CRs at a given CL but actually over- 
states it. 

It is, however, still interesting and possible to know the 
size of the CR^^in that relates to the true CL. From the self- 
consistency test we have a relation between the output CRs 
and the true CLs from Timing-i-i-. This means we can 
compare the output areas of the minimal credible regions of 
the true 50% CL by using the quoted 23% CR^^jn from 
Timing-i-i- and the 50% from LALInference. In other 
words, we are correcting for the lack of self consistency of 
Timing-i-i- and can produce a fair comparison of the two 
methods. 

Figure 3 shows the fraction of signals whose 50% 
CR„jinS were smaller than a given area. We can see that 
even after the corrections to the CLs are implemented, 
Timing-i-i- gives significantly larger CR^^jnS. This happens 
because the PDFs returned from Timing-i-i- are not quite the 
same shape as the “correct” PDFs that LALInference is 
returning; the differences are not simply a rescaling of the 
width of the peak. 

While this test was quite natural from the Bayesian 
framework point of view, another piece of information that 
would be passed to follow-up telescopes would be a list of 
the most likely “pixels” on the sky. One can easily consider 
a follow-up strategy in which these tiles are observed by 
telescopes in order, until a possible counterpart of the GW- 
detected source is imaged (or one runs out of pointings). 
This searched area is equivalent to the size of the CR^jn 



FIG. 3 (color online). The fraction of detected signals whose 
associated tme (corrected) 50% CRn,in covers less than a given 
area on the sky. We can see that LALInference gives much 
tighter constraints than Timing-h- on the location of a source. 
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EIG. 4 (color online). The fraction of sources where the 
injection would have been imaged after searching less than the 
given area in a telescope greedy algorithm. 

whose boundary passes through the source’s true location 
on the sky. Furthermore, by considering this area for both 
approaches we bypass the need to correct for the true 
relation between probability and CL. Figure 4 shows the 
fraction of sources that would be imaged after only 
the given area is searched over, for each source, using 
the CRminS as discussed above. We can see that there is a 
significant difference between the two sky localization 
approaches; for example, 76% of sources would be found 
after searching 20 deg^ if we followed the output of 
LALInference, whereas we would only have found 
38% of the injections by following Timing++. 

To gain a better feel for the difference in the calculated 
areas for the two methods, we compared the areas injection 
by injection. We plot the areas of the true (corrected) 50% 
CR found by each code where the injections are sorted by 
SNR (Fig. 5). For the LALIneerence results we can see 
the expected scaling of the area oc 1/ SNR^. We also plot the 
ratio of the 50% CR^jn areas determined by the two codes 
in Fig. 6. We can see that there is significant spread around 
the typical factor of 20 difference between the calculated 
CRnjin areas. 

These results should not be taken as a statement on 
the expected sky localization accuracy as the underlying 
injection distribution is not astrophysical. The set of injec- 
tions was chosen to test and compare the codes over a wide 
region of parameter space and should be treated as such. 

D. Run time 

Timing-i-i- has been set up with speed in mind and so the 
run time to extract the sky location after data is received is 
on the order of minutes [8]. Prior to the analysis, the 
distributions s(.|^) ^nd p(AAjss|Q) need to be 

generated, and this is done with large scale simulations. 
Despite being computationally expensive — the simulations 



FIG. 5 (color online). The sky area of the 50% tme (corrected) 
minimum credible region for each of the sources as a function of 
the optimal network SNR of the signal. While there is some 
scatter, the areas from EALInference [solid (red) dots] scale as 
oc 1 /SNR^, as one would expect, while the areas from Timing++ 
[open (green) circles] are closer to oc 1/SNR. 

require on the order of days to weeks — this step is done 
prior to the actual analysis and therefore has no impact on 
the latency of the online analysis. 

While considering code speed, we need to specify the 
specific sampler used in LALInference. Here, we report 
results for lalineerence_mcmc, the sampling method 
that was used for this study. A comparison between different 
samplers in LALIneerence will be reported elsewhere. 



FIG. 6 (color online). The ratio of recovered areas of the 50% 
tme (corrected) CRs using LAEInference as the baseline. While 
there is some scatter, LALInference is consistently producing 
smaller areas than Timing++ by a factor which is roughly 10 for 
low SNRs and approximately scales with SNR. 
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FIG. 7 (color online). The cumulative distribution of wall times 
for LALINFERENCE_MCMC to output a new independent sample 
across the runs performed to generate the results reported in this 
paper. With 10 cores used for each run, CPU times were a factor 
of 10 larger. 

There are two main metrics of computational cost that we 
consider here: the so-called “wall time” (the time an analysis 
joh takes from start to finish), and the total processing (CPU) 
time. LALINFERENCE_MCMC is designed to take advantage of 
multiple cores and mns in parallel on different processors. 
The parallel chains explore likelihoods at different contrast 
levels (“temperatures”). We find that roughly 10 chains are 
optimal for improving sampling and convergence for the 
data sets considered in this study; therefore, CPU times are a 
factor of ten larger than wall times. 

The important quantity to report for LALInference is 
the time required to output a new independent sample of the 
posterior PDF. The precise number of samples that we 
deem necessary to describe the PDF is a balance between 
speed and precision; as mentioned earlier, finite sample- 
size issues are a concern for postprocessing, and we have 
found that we require at least 1000 independent samples. 

In Fig. 7 we show the fraction of the analysis runs that 
output a single independent sample within a given wall 
time. This quantity was derived by dividing the total wall 
time of each injection run by the number of independent 
samples generated in that run. From this graph we can see 
that 90% of the runs had output 1000 independent samples 
in ~14 hours of wall time. The runs were done on nodes 
composed of Intel Nehalem E5520 processors (2.26 GHz) 
with Infiniband double data rate interconnects. 

IV. DISCUSSION 

In this paper we have considered two sky localization 
algorithms, LALInference and Timing+-i-, used during 
the final science run of the LIGO and Virgo instruments in 
initial configuration. Our goal was to assess the relative 
benefits and costs of the two approaches, and to develop a 


strategy as well as practical tools to evaluate the consis- 
tency of the results and inform the future direction of 
development. We are now applying these tools to a number 
of parameter estimation research projects. 

Lor the study presented in this paper we have considered a 
synthetic data set representing a three-detector network. GW 
signals generated during the inspiral phase of the coales- 
cence of binary systems with a total mass smaller than 20 Mq 
and nonspinning components were added to Gaussian and 
stationary noise representative of the sensitivity of initial 
LIGO. We have chosen the range of source parameters in 
order to best explore the performance of the algorithms. This 
is important for testing purposes, but one cannot draw 
conclusions about the actual performance of the GW instru- 
ments in future observations from these simulations. To 
address that question, one would need to consider an 
astrophysically motivated population of sources, e.g., bina- 
ries distributed uniformly in volume, and then consider sky 
localization only for those signals that pass a detection 
threshold of the search pipeline. 

As discussed in Sec. Ill, posteriors can be systemati- 
cally biased because of incorrect models, inaccurate 
priors, insufficient sampling or improper postprocessing 
to estimate credible regions. 

Incorrect models are always a concern in parameter 
estimation. Our likelihood model, p{d\6,H), could be 
incorrect because of inaccuracies in the waveform models, 
noise models or calibration errors. Waveforms may not 
include certain features (e.g., in this study, we did not allow 
for spinning binary components) or are affected by limi- 
tations in the accuracy of waveform models; efforts are 
under way to develop more accurate and complete models 
[36,37] and to account for waveform uncertainty directly in 
parameter estimation. Real detector noise is neither sta- 
tionary nor Gaussian; promising strides have been made in 
accounting for noise nonstationarity [38], shifts in spectral 
lines and even glitches in the noise. The impact of 
calibration errors on parameter estimation was analyzed 
in the context of advanced detectors [39]. In this study, our 
models were correct by construction, as we used stationary, 
Gaussian noise, assumed perfect calibrations and employed 
the same waveform families for injections and templates. 

In this paper, we explicitly made sure that the priors 
assumed by LALInference were identical to the injection 
distribution to guarantee that inaccurate priors did not 
introduce a bias in the results, and our code development 
efforts and thorough testing ensured that insufficient 
sampling was not a concern. 

We did find early in our studies that our initial approach 
to postprocessing could lead to systematically understated 
posterior credible regions. We addressed this by developing 
a more sophisticated postprocessing procedure (see below 
and [29]). 

There is an important difference between self consis- 
tency and optimality of the results. Self consistency is a 


084060-9 


SIDERY et al. 


PHYSICAL REVIEW D 89 , 084060 (2014) 


requirement of any code that claims to provide reliable 
credible regions: the credible regions corresponding to a 
given confidence level must include the true source 
parameters for a fraction of signals equal to that confidence 
level. Optimality refers to an algorithm’s ability to return 
the smallest credible region among all self-consistent 
credible regions. A self-consistent algorithm need not be 
optimal. When it comes to our ability to optimize, we must 
consider both the main algorithm and the postprocessing of 
the results. 

As has been shown here, the proportion of available 
information that is utilized in the analysis can significantly 
affect the accuracy of parameter estimation. LALInference 
uses the data taken from all detectors coherently and thereby 
recovers small credible regions while staying self consistent. 
T1MING++, on the other hand, purposefully makes simplifi- 
cations, using intermediate data products from the incoherent 
analysis of individual detector data, and hence the recovered 
credible regions, even after a correction for self consistency, 
are much larger. The trade-off hes in the mntime of the 
analyses: Timing-i-i- returns a sky location within minutes 
of the completion of the search, whereas LALIneerence 
takes approximately half a day (wall time) for the specific 
waveform family and network considered here. 

Optimality is also important for the postprocessing of the 
algorithms’ output to generate marginalized PDFs and 
credible regions. A binning scheme is traditionally applied 
in which the parameter space is split into a uniform grid and 
the average density of samples in each region found. Using a 
greedy approach based on this scheme to calculate optimal 
credible regions (CRn,in), self consistency is broken [29]. For 
LALInference we have therefore implemented a more 
sophisticated way of setting up the initial bins known as a 
kD-tree so that the resolution of bins follows the density of 
the samples. A two-stage approach to ordering bins and 
estimating their contribution to the posterior is required to 
satisfy self consistency while managing to get close to 
optimality. This method will be described in full elsewhere 
[29]. While we have successfully applied this to two- 
dimensional posteriors in this study, we cannot currently 
extend this scheme to higher dimensions: the number of 
LALInference output samples required for accurate kD- 
tree PDF interpolation grows exponentially with the number 
of dimensions and so the runs become impractically long. 


While we have outlined the procedure for testing that an 
algorithm and its implementation report self-consistent 
results, it is difficult to check for optimality. One approach 
is to set up runs where the posterior PDFs are known, which 
was indeed done as part of the LALIneerence testing and 
validation [19]. By design these will be simple analytic 
functions and there is no general prescription that will test 
for all circumstances. 

The work that we have reported here, and the tools that 
we have developed and described, have already been 
important in the further development of LALIneerence. 
A new low-latency sky localization pipeline has also been 
developed [40]. It is important for future work that while we 
strive to improve on our methods in both speed and 
accuracy, we continue to validate these methods against 
the tests described here in order to have a reliable analysis 
when the next generation of detectors begins collect- 
ing data. As we move toward simultaneous and tar- 
geted electromagnetic observations of gravitational-wave 
sources, it is ever more important that sky localization be 
performed accurately and self consistently. 
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